[PerfXLab] optimize sqrt op performance#2217

Open
bin913 wants to merge 1 commit into flagos-ai:master from bin913:sqrt

Conversation

Contributor

@bin913 bin913 commented Apr 2, 2026

PR Category

[ Operator]

Type of Change

[Performance Optimization]

Description

Optimize sqrt op performance.

Issue

Progress

  • Change is properly reviewed (1 reviewer required, 2 recommended).
  • Change responds to an issue.
  • Change is fully covered by a UT.

Performance

test_unary_pointwise_perf.py::test_general_unary_pointwise_perf[sqrt-sqrt-dtypes11] 
Operator: sqrt  Performance Test (dtype=torch.float16, mode=kernel,level=core)
Status       Torch Latency (ms)    Gems Latency (ms)         Gems Speedup               TFLOPS          Size Detail
--------------------------------------------------------------------------------------------------------------------
SUCCESS               1.423072            1.410560               1.009               0.761          [torch.Size([1073741824])]
SUCCESS               0.006176            0.005440               1.135               0.001          [torch.Size([64, 64])]
SUCCESS               0.028768            0.028000               1.027               0.599          [torch.Size([4096, 4096])]
SUCCESS               0.029184            0.027920               1.045               0.601          [torch.Size([64, 512, 512])]
SUCCESS               1.421040            1.411200               1.007               0.761          [torch.Size([1024, 1024, 1024])]


Operator: sqrt  Performance Test (dtype=torch.float32, mode=kernel,level=core)
Status       Torch Latency (ms)    Gems Latency (ms)         Gems Speedup               TFLOPS          Size Detail
--------------------------------------------------------------------------------------------------------------------
SUCCESS               2.826640            2.824240               1.001               0.380          [torch.Size([1073741824])]
SUCCESS               0.006208            0.005504               1.128               0.001          [torch.Size([64, 64])]
SUCCESS               0.050784            0.050144               1.013               0.335          [torch.Size([4096, 4096])]
SUCCESS               0.050720            0.050080               1.013               0.335          [torch.Size([64, 512, 512])]
SUCCESS               2.828192            2.825664               1.001               0.380          [torch.Size([1024, 1024, 1024])]


Operator: sqrt  Performance Test (dtype=torch.bfloat16, mode=kernel,level=core)
Status       Torch Latency (ms)    Gems Latency (ms)         Gems Speedup               TFLOPS          Size Detail
--------------------------------------------------------------------------------------------------------------------
SUCCESS               1.427072            1.411952               1.011               0.760          [torch.Size([1073741824])]
SUCCESS               0.006176            0.005408               1.142               0.001          [torch.Size([64, 64])]
SUCCESS               0.028768            0.027808               1.035               0.603          [torch.Size([4096, 4096])]
SUCCESS               0.029216            0.027840               1.049               0.603          [torch.Size([64, 512, 512])]
SUCCESS               1.422864            1.411168               1.008               0.761          [torch.Size([1024, 1024, 1024])]
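The Speedup and TFLOPS columns above can be cross-checked by hand. Speedup is the ratio of Torch latency to Gems latency; TFLOPS appears to be computed as roughly one FLOP per element divided by the Gems latency (that per-element cost is an inference from the numbers, not something stated in the PR):

```python
# Cross-check the first float16 row: 1073741824 elements,
# Torch 1.423072 ms vs Gems 1.410560 ms.
n_elements = 1073741824
torch_ms, gems_ms = 1.423072, 1.410560

# Speedup is simply the latency ratio.
speedup = torch_ms / gems_ms
print(round(speedup, 3))  # 1.009, matching the Speedup column

# Assuming sqrt is counted as 1 FLOP per element:
tflops = n_elements / (gems_ms * 1e-3) / 1e12
print(round(tflops, 3))  # 0.761, matching the TFLOPS column
```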

@triton.autotune(
    configs=[
        triton.Config({"BLOCK_SIZE": 2048}, num_stages=4, num_warps=1),
    ],
    key=["n_elements"],
)
Collaborator

Please transfer the autotune optimization configurations to src/flag_gems/runtime/backend/_nvidia/hopper/tune_configs.yaml.
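The reviewer asks for the tuning parameters to live in a per-backend YAML file rather than being hard-coded in the operator. The actual schema of tune_configs.yaml is not shown in this thread, so the following is only a rough, hypothetical sketch of how such an entry could be expressed (the key names here are assumptions for illustration, not the file's real format):

```yaml
# Hypothetical layout -- key names are illustrative, not the actual schema
# of src/flag_gems/runtime/backend/_nvidia/hopper/tune_configs.yaml.
sqrt:
  - META:
      BLOCK_SIZE: 2048
    num_stages: 4
    num_warps: 1
```

Centralizing configs this way lets each backend (e.g. Hopper) ship its own tuned block sizes without touching the kernel source.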


3 participants